A POMDP Approximation Algorithm That Anticipates the Need to Observe

Author

  • Valentina Bayer Zubek
Abstract

This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent MDP, the 2MDP, whose value function, V_2MDP, can be combined online with a 2-step lookahead search to provide a good POMDP policy. We prove that this gives an approximation to the POMDP's optimal value function that is at least as good as methods based on the optimal value function of the underlying MDP. We present experimental evidence that the method finds a good policy for a POMDP with 10,000 states and observations.
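To make the online control scheme concrete, below is a minimal Python sketch (not the authors' code) of a depth-limited lookahead over belief states whose leaf beliefs are scored by an approximate state value function such as V_2MDP (or V_MDP, the baseline the paper improves on). The model interfaces T, Z, R, the helper names, and the discount factor are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict

def belief_update(belief, action, obs, T, Z):
    """Bayes filter: b'(s') is proportional to Z(obs | s', a) * sum_s T(s' | s, a) * b(s)."""
    b_next = defaultdict(float)
    for s, p in belief.items():
        for s2, pt in T(s, action).items():            # T(s, a) -> {s': prob}
            b_next[s2] += p * pt * Z(s2, action, obs)   # Z(s', a, o) -> prob
    total = sum(b_next.values())
    return {s: p / total for s, p in b_next.items()} if total > 0 else None

def q_value(belief, action, depth, A, O, T, Z, R, V_leaf, gamma=0.95):
    """Expected reward of `action` plus discounted value of the resulting beliefs.

    At depth 1 the successor beliefs are leaves and are scored with the
    approximate state value function, e.g. sum_s b(s) * V_2MDP(s).
    """
    immediate = sum(p * R(s, action) for s, p in belief.items())
    future = 0.0
    for o in O:
        # P(o | b, a) = sum_{s, s'} b(s) * T(s' | s, a) * Z(o | s', a)
        p_o = sum(p * pt * Z(s2, action, o)
                  for s, p in belief.items()
                  for s2, pt in T(s, action).items())
        if p_o > 0.0:
            b2 = belief_update(belief, action, o, T, Z)
            if depth == 1:
                v = sum(q * V_leaf(s) for s, q in b2.items())
            else:
                v = max(q_value(b2, a2, depth - 1, A, O, T, Z, R, V_leaf, gamma)
                        for a2 in A)
            future += p_o * v
    return immediate + gamma * future

def choose_action(belief, A, O, T, Z, R, V_leaf, depth=2, gamma=0.95):
    """Online policy: act greedily w.r.t. the depth-limited lookahead (depth 2 here)."""
    return max(A, key=lambda a: q_value(belief, a, depth, A, O, T, Z, R, V_leaf, gamma))
```

Plugging V_2MDP in as V_leaf corresponds to the policy the abstract describes; plugging in the optimal value function of the underlying MDP gives the baseline against which the paper's bound is stated.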


Similar Resources

A POMDP Approximation Algorithm That Anticipates the Need to Observe

This paper introduces the even-odd POMDP, an approximation to POMDPs (Partially Observable Markov Decision Problems) in which the world is assumed to be fully observable every other time step. This approximation works well for problems with a delayed need to observe. The even-odd POMDP can be converted into an equivalent MDP, the 2MDP, whose value function V_2MDP can be combined online with a 2-step loo...


Two Heuristics for Solving POMDPs Having a Delayed Need to Observe

A common heuristic for solving Partially Observable Markov Decision Problems (POMDPs) is to first solve the underlying Markov Decision Process (MDP) and then construct a POMDP policy by performing a fixed-depth lookahead search in the POMDP and evaluating the leaf nodes using the MDP value function. A problem with this approximation is that it does not account for the need to choose actions in order t...


A Model Approximation Scheme for Planning in Partially Observable Stochastic

Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle...


Dialogue POMDP components (Part II): learning the reward function

The partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework to represent uncertainty explicitly while being robust to noise. In this context, estimating the dialogue POMDP model components (states, observations, and reward) is a significant challenge as they have a direct impact on the optimized dialogue POMDP policy. Learning sta...


POMDP-based Statistical Spoken Dialogue Systems: a Review

Statistical dialogue systems are motivated by the need for a data-driven framework that reduces the cost of laboriously hand-crafting complex dialogue managers and that provides robustness against the errors created by speech recognisers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimising the policy via a reward-driven process, partially obs...



Journal:

Volume   Issue

Pages  -

Publication year: 2000